Building a BNPL Platform From Scratch: What I Learned Building a Fintech Microservices Project
I spent the last few months building a Buy Now, Pay Later platform as a portfolio project, targeting a Sezzle-style job description that asked for Go, React/TypeScript, Elasticsearch, and solid API design. This post covers everything I built and — more importantly — the reasoning behind each decision. If you're learning backend or fullstack development, I hope this saves you some of the head-scratching I went through.
The project is split into two services:
- BNPL Core Engine — Go REST API on port 8080, Postgres, Docker
- Merchant Dashboard — React/TypeScript frontend (port 3000), Go API backend (port 8081), Elasticsearch
Here's the architecture at a glance:
```
Browser (React)
       │
       ▼
Merchant API (:8081) ──────── Elasticsearch (:9200)
       │                        (search index)
       ▼
Postgres (:5432)  ◄──────── BNPL Engine (:8080)
(source of truth)           (creates orders,
                             processes payments)
```
Let's walk through how I built it phase by phase.
Phase 1: The BNPL Core Engine
Never Use Floats for Money
This was the first thing that genuinely surprised me when I started reading about fintech. I knew floats had precision issues in theory, but I had no idea how bad it actually is in practice:
0.1 + 0.2 = 0.30000000000000004
That's not a language bug. That's IEEE 754 floating point — the standard used by virtually every language's float64 type. The number 0.1 literally cannot be represented in binary floating point, the same way 1/3 can't be represented in decimal without repeating forever.
In fintech, this compounds. Splitting $100.01 four ways with floats gives you four numbers that each look correct individually, but when you sum them back up you're off by a cent. Multiply that across millions of transactions and you've got a real problem.
The fix is straightforward: store money as integer cents. $100.01 becomes 10001. Then:
10001 / 4 = 2500, remainder 1
Exact. No rounding. Every real payment company — Stripe, Square, Sezzle — does this.
The Payment Splitting Algorithm
Once I understood the integer cents approach, the splitting algorithm followed naturally. SplitPayment(10001, 4) should return [2501, 2500, 2500, 2500].
1. base = totalCents / numInstallments (integer division)
2. remainder = totalCents % numInstallments
3. First `remainder` installments each get base + 1
4. Remaining installments get base
The mathematical invariant: the sum of all installments always equals the original total. No lost cents, no rounding drift.
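Here's a minimal Go version of those steps; it's simplified, and the real function in the repo may differ in details like input validation, but the shape is this:

```go
// SplitPayment divides totalCents into numInstallments integer amounts
// whose sum is exactly totalCents. The earliest installments absorb the remainder.
func SplitPayment(totalCents, numInstallments int64) []int64 {
	base := totalCents / numInstallments
	remainder := totalCents % numInstallments

	installments := make([]int64, numInstallments)
	for i := range installments {
		installments[i] = base
		if int64(i) < remainder {
			installments[i]++ // the first `remainder` installments get one extra cent
		}
	}
	return installments
}
```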
I proved this with a fuzz test in payment_plan_test.go that checks 12,000 combinations of different totals and installment counts. Every single one satisfies the invariant. Writing that test was when the value of fuzz testing clicked for me — you're not checking one case, you're exploring the entire input space.
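A native Go fuzz test for that invariant looks roughly like this (the bounds and seed here are illustrative, not copied from my actual payment_plan_test.go):

```go
func FuzzSplitPayment(f *testing.F) {
	f.Add(int64(10001), 4) // seed: the example from above
	f.Fuzz(func(t *testing.T, totalCents int64, numInstallments int) {
		if totalCents <= 0 || numInstallments <= 0 || numInstallments > 48 {
			t.Skip() // stay inside the domain the engine actually accepts
		}
		var sum int64
		for _, c := range SplitPayment(totalCents, int64(numInstallments)) {
			sum += c
		}
		if sum != totalCents {
			t.Fatalf("SplitPayment(%d, %d): installments sum to %d", totalCents, numInstallments, sum)
		}
	})
}
```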
Database Design: Defense in Depth
I put business logic constraints in the database itself, not just in application code:
- `CHECK (total_cents > 0)` — the database rejects invalid data even if a bug in my Go code somehow passes a negative amount through
- `UNIQUE(order_id, installment_num)` — duplicate installments are impossible at the storage layer
- Foreign keys — you can't create an order for a merchant that doesn't exist
- `CREATE INDEX idx_orders_merchant_id` — pagination queries on a merchant's orders stay fast as data grows
The principle is defense in depth. Application code has bugs. Database constraints are the last line of defense, and they should be there regardless.
Transactions: The Only Correct Way to Create an Order
When creating an order, I need to insert the order row and all four installment rows. If installment 3 fails mid-insert, what do I have in the database? An order with two installments — corrupt, impossible-to-use state.
The fix is wrapping the entire operation in a single BEGIN/COMMIT transaction. All-or-nothing. If anything fails, the whole thing rolls back. This seems obvious in retrospect, but before I built this I had never thought carefully about what "atomic" really means in a financial context.
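With pgx it looks roughly like this; the column names and `Order` fields are condensed from the real schema, and error handling is trimmed:

```go
func (r *Repository) CreateOrder(ctx context.Context, o Order, installments []int64) error {
	tx, err := r.pool.Begin(ctx)
	if err != nil {
		return err
	}
	// Rollback is a no-op once Commit has succeeded.
	defer tx.Rollback(ctx)

	_, err = tx.Exec(ctx,
		`INSERT INTO orders (id, merchant_id, customer_name, total_cents, status)
		 VALUES ($1, $2, $3, $4, 'active')`,
		o.ID, o.MerchantID, o.CustomerName, o.TotalCents)
	if err != nil {
		return err
	}

	for i, cents := range installments {
		_, err = tx.Exec(ctx,
			`INSERT INTO installments (order_id, installment_num, amount_cents, status)
			 VALUES ($1, $2, $3, 'pending')`,
			o.ID, i+1, cents)
		if err != nil {
			return err // nothing persists: the deferred Rollback undoes the partial insert
		}
	}

	return tx.Commit(ctx)
}
```

The `installments` slice is the output of `SplitPayment`, so the all-or-nothing guarantee covers the order and every installment together.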
SELECT FOR UPDATE: Race Conditions Are Not Theoretical
This one took me a while to fully grasp. The payment endpoint needs to check whether an installment is in pending state before marking it as paid. Simple, right?
Not if two simultaneous payment requests hit the same installment:
Request A: SELECT status FROM installments WHERE id = $1 → 'pending'
Request B: SELECT status FROM installments WHERE id = $1 → 'pending'
Request A: UPDATE installments SET status = 'paid' ← succeeds
Request B: UPDATE installments SET status = 'paid' ← also succeeds!
Double-charged. The customer pays twice.
SELECT ... FOR UPDATE locks the row at read time. Request B blocks at the SELECT until Request A commits, then reads status = 'paid' and returns an error. No double charge.
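Condensed, the payment path looks like this (the sentinel error names and exact columns are illustrative):

```go
func (r *Repository) PayInstallment(ctx context.Context, id string, amountCents int64) error {
	tx, err := r.pool.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx)

	// FOR UPDATE locks the row: a concurrent request blocks here until we commit.
	var status string
	var due int64
	err = tx.QueryRow(ctx,
		`SELECT status, amount_cents FROM installments WHERE id = $1 FOR UPDATE`,
		id).Scan(&status, &due)
	if err != nil {
		return err
	}
	if status != "pending" {
		return ErrAlreadyPaid // the second request lands here after the first commits
	}
	if due != amountCents {
		return ErrWrongAmount
	}

	_, err = tx.Exec(ctx, `UPDATE installments SET status = 'paid' WHERE id = $1`, id)
	if err != nil {
		return err
	}
	return tx.Commit(ctx)
}
```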
Race conditions in toy projects feel theoretical. In a payment system with real concurrency, they're not.
Testing at Three Levels
I ended up with 24 tests structured at three levels, and having all three turned out to be important:
Unit tests (payment_plan_test.go): Pure math, no database, run in milliseconds. The fuzz test lives here. These are the ones I run constantly while iterating.
Integration tests (repository_test.go): Hit real Postgres in Docker. Verify that my SQL queries, transactions, and foreign key constraints work the way I think they do. You cannot trust that your SQL is correct without running it against a real database.
End-to-end tests (handler_test.go): Real HTTP requests through the full stack — chi router → handler → service → repository → Postgres → HTTP response. These test double-payment rejection, wrong-amount rejection, and the full create-then-pay flow. If I break something in the handler layer, the unit tests don't catch it. The E2E tests do.
Tech Stack for Phase 1
- Go with chi router (lightweight, idiomatic, great middleware support)
- pgx/v5 for Postgres with pgxpool for connection pooling
- Docker Compose with a healthcheck that waits for Postgres to be actually ready, not just started — a distinction that bit me early on when my app tried to connect before Postgres finished initializing
- Graceful shutdown on SIGINT: stop accepting new connections but finish in-flight requests before exiting (sketched below)
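The shutdown wiring is all standard library; `newRouter` below stands in for the chi setup:

```go
func run() error {
	srv := &http.Server{Addr: ":8080", Handler: newRouter()}

	// Cancel ctx when the process receives SIGINT so we can shut down cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	errCh := make(chan error, 1)
	go func() {
		// ErrServerClosed is the normal result of Shutdown, not a failure.
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			errCh <- err
		}
	}()

	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		// Stop accepting new connections; give in-flight requests 10s to finish.
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		return srv.Shutdown(shutdownCtx)
	}
}
```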
Phase 2: The Merchant Dashboard Backend
Shared Database: A Deliberate Tradeoff
I had two options for the merchant API's data layer:
Option A: Shared Postgres — both services read/write the same database.
Option B: Separate databases — requires syncing state via HTTP calls, polling, or a message queue like Kafka.
I went with Option A, but with a deliberate boundary: separate schemas. The BNPL engine owns the public schema (merchants, customers, orders, installments). The merchant API gets a merchant schema for tables it owns (future things like refresh tokens).
The merchant API reads from public.merchants for authentication but never writes there. This makes the ownership boundary visible. If I ever need to split to separate databases, I know exactly what belongs where.
This felt like the right call for a portfolio project and probably for an early-stage product. The Kafka route is genuinely complex to operate correctly.
JWT Authentication: Why Not Sessions?
In a microservices setup, session auth requires a shared session store — typically Redis — that every service calls to validate a session. That's another dependency, another point of failure, another thing to keep in sync.
JWTs are self-contained. Any service that knows the signing secret can validate a token without calling anything else. Stateless services are easier to scale and easier to reason about.
The login flow:
- Look up the merchant by email in `public.merchants`
- `bcrypt.CompareHashAndPassword` — bcrypt is intentionally slow, around 100ms at cost 10. A GPU can brute-force MD5 hashes in seconds; bcrypt hashes take years. The comparison is also constant-time, so response time doesn't leak how much of the hash matched.
- Issue an HS256 JWT with `merchant_id`, `name`, and `email` in the claims. 24-hour TTL.
Security detail I learned: always return the same 401 Unauthorized error whether the email doesn't exist or the password is wrong. Differentiating them lets attackers enumerate valid email addresses.
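Put together, the core of the login service looks roughly like this; the claim keys match the list above, but `PasswordHash`, `ErrInvalidCredentials`, and `s.secret` stand in for whatever the real model and config use:

```go
import (
	"context"
	"time"

	"github.com/golang-jwt/jwt/v5"
	"golang.org/x/crypto/bcrypt"
)

func (s *AuthService) Login(ctx context.Context, email, password string) (string, error) {
	m, err := s.repo.GetMerchantByEmail(ctx, email)
	if err != nil {
		// Same error whether the email is unknown or the password is wrong,
		// so attackers can't enumerate accounts.
		return "", ErrInvalidCredentials
	}

	if err := bcrypt.CompareHashAndPassword([]byte(m.PasswordHash), []byte(password)); err != nil {
		return "", ErrInvalidCredentials
	}

	claims := jwt.MapClaims{
		"merchant_id": m.ID,
		"name":        m.Name,
		"email":       m.Email,
		"exp":         time.Now().Add(24 * time.Hour).Unix(),
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString([]byte(s.secret))
}
```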
The alg:none attack is one I hadn't heard of before. CVE-2015-9235. An attacker crafts a JWT with "alg": "none" — and some JWT libraries accept this as valid because they interpret it as "no signature required." The fix is checking token.Method.(*jwt.SigningMethodHMAC) in your middleware, explicitly rejecting anything that isn't HMAC. I added this to the middleware after reading about it.
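With the golang-jwt library, that check is a few lines inside the auth middleware (`secret` is the signing key from config):

```go
token, err := jwt.Parse(tokenString, func(t *jwt.Token) (interface{}, error) {
	// Reject "alg": "none" and any non-HMAC algorithm before returning the key.
	if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
		return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
	}
	return []byte(secret), nil
})
if err != nil || !token.Valid {
	http.Error(w, "unauthorized", http.StatusUnauthorized)
	return
}
```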
The honest JWT tradeoff: you can't invalidate a token before it expires. Compromised account? That token is valid until ExpiresAt. Solutions exist — token blocklists, short TTL + refresh tokens — but they add complexity. For this project, 24-hour TTL is the acceptable middle ground.
Elasticsearch: Why Not Just LIKE?
This one was genuinely illuminating. My first instinct was WHERE customer_name LIKE '%alice%'. It works. Then I learned what actually happens under the hood.
LIKE '%alice%' with a leading wildcard does a full table scan. It reads every row in the orders table. With 10 million orders, that's unusable — a query that takes minutes, not milliseconds.
Elasticsearch builds an inverted index, which I now think of like the index at the back of a book. The book doesn't scan every page to find "database" — it looks in the index and immediately knows it's on pages 42, 156, and 203. ES does the same: it pre-processes every value at write time, so lookups stay fast no matter how large the dataset gets.
Beyond speed, Elasticsearch handles things Postgres LIKE can't:
- Fuzzy matching: `"Alce"` matches `"Alice"` via edit distance. Handles typos.
- Relevance scoring: results are ranked by how well they match
- Partial matches: finds `"alice"` in `"Alice Johnson"`
The text vs keyword distinction confused me for a while. In ES index mappings:
- `"text"`: ES tokenizes the value — splits on whitespace, lowercases. `"Alice Johnson"` is indexed as `["alice", "johnson"]`. You can search for "alice" and find it. But you can't sort by a text field reliably.
- `"keyword"`: stored as-is. `"Alice Johnson"` only matches the exact string `"Alice Johnson"`. Sortable, filterable, aggregatable.
My mapping uses text for customer_name (full-text search) and keyword for merchant_id and status (exact filters). Once I understood the distinction, the design was obvious.
The search query uses Elasticsearch's bool query with two clauses (the request body is sketched after this list):
- `must` + `match` on `customer_name` with `fuzziness: AUTO` — full-text, affects relevance scoring
- `filter` + `term` on `merchant_id` — exact match, no scoring, cacheable by ES
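Built as a plain map and sent through the official Go client, the body looks like this; the index name and surrounding variables are illustrative:

```go
query := map[string]interface{}{
	"query": map[string]interface{}{
		"bool": map[string]interface{}{
			"must": []interface{}{
				map[string]interface{}{
					"match": map[string]interface{}{
						"customer_name": map[string]interface{}{
							"query":     searchTerm,
							"fuzziness": "AUTO", // "Alce" still matches "Alice"
						},
					},
				},
			},
			"filter": []interface{}{
				map[string]interface{}{
					"term": map[string]interface{}{
						"merchant_id": merchantID, // exact match, not scored, cacheable
					},
				},
			},
		},
	},
}

var body bytes.Buffer
if err := json.NewEncoder(&body).Encode(query); err != nil {
	return nil, err
}
res, err := es.Search(
	es.Search.WithContext(ctx),
	es.Search.WithIndex("orders"),
	es.Search.WithBody(&body),
)
```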
On startup, the merchant API reads all orders from Postgres and upserts them into ES using the order UUID as the document ID. Idempotent — run it ten times, same result. Runs in a background goroutine so the server doesn't block on startup.
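The sync itself is a short loop; the repository method, index name, and field names here are illustrative:

```go
func (s *Server) syncOrdersToES(ctx context.Context) {
	orders, err := s.repo.ListAllOrders(ctx)
	if err != nil {
		log.Printf("es sync: %v", err)
		return
	}
	for _, o := range orders {
		doc, _ := json.Marshal(o)
		// Indexing with an explicit document ID is effectively an upsert:
		// re-running the sync overwrites documents instead of duplicating them.
		res, err := s.es.Index("orders",
			bytes.NewReader(doc),
			s.es.Index.WithDocumentID(o.ID),
			s.es.Index.WithContext(ctx),
		)
		if err != nil {
			log.Printf("es sync: index %s: %v", o.ID, err)
			continue
		}
		res.Body.Close()
	}
}
```

main kicks it off with `go s.syncOrdersToES(ctx)`, so the HTTP server comes up immediately while the index fills in the background.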
Narrow Interfaces in Go
One Go pattern that clicked for me during this phase:
type merchantRepo interface {
GetMerchantByEmail(ctx context.Context, email string) (model.Merchant, error)
}
The auth service declares only the methods it actually needs, not the full Repository struct. "Accept interfaces, return concrete types" is idiomatic Go for good reason — it makes the dependencies explicit and makes testing trivial. To test the auth service, I pass in a struct that implements just that one method. No mocking framework needed.
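A test double for that interface is just a struct (the constructor and names here are illustrative):

```go
type fakeMerchantRepo struct {
	merchant model.Merchant
	err      error
}

func (f fakeMerchantRepo) GetMerchantByEmail(ctx context.Context, email string) (model.Merchant, error) {
	return f.merchant, f.err
}

func TestLoginUnknownEmail(t *testing.T) {
	svc := NewAuthService(fakeMerchantRepo{err: errors.New("not found")}, "test-secret")
	if _, err := svc.Login(context.Background(), "nobody@example.com", "hunter2"); err == nil {
		t.Fatal("expected an auth error for an unknown email")
	}
}
```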
Phase 3: The React Frontend
Vite Over Create React App
Create React App is unmaintained at this point. Vite is the modern standard. The key difference in development: Vite uses native ES modules. The browser loads files individually via import statements — no bundling during dev. Changes appear in the browser in milliseconds via HMR. In production, it uses Rollup for an optimized bundle.
The Vite proxy configuration was one of those small things that makes everything cleaner:
proxy: {
'/api': { target: 'http://localhost:8081', changeOrigin: true }
}
fetch('/api/v1/auth/login') — no hostname. Vite intercepts and forwards to localhost:8081. The browser sees a same-origin request. No CORS headers needed in development. In production, Nginx does the exact same job.
JWT in Memory, Not localStorage
localStorage is accessible to any JavaScript on the page. One XSS vulnerability — one bad npm package, one injected script — and every user's token is stolen. Storing the JWT in React state means:
- An injected script can't simply read it the way it can read localStorage — there's no `localStorage.getItem('token')`-style API into React state
- Trade-off: the token disappears on page refresh, so the user has to log in again
The production-correct approach is an httpOnly cookie — the browser sends it automatically with every request, but JavaScript can't read it at all, and it survives refresh. That requires backend changes (Set-Cookie header) which I've noted as a next step.
The AuthContext Pattern
JWT stored in React useState. Any component in the tree calls useAuth() to get the current token. A ProtectedRoute component checks for the token and redirects to /login if it's absent. Protected pages don't touch JWT at all — they just render. Clean separation.
Debounce with useRef
The SearchBar debounces the input 300ms before calling the search API. Without debouncing, typing "alice" fires five requests: a, al, ali, alic, alice. Four of those are wasted. With debouncing, one request fires 300ms after you stop typing.
The implementation detail that mattered: the timer ID is stored in useRef, not useState. Changing a ref doesn't trigger a re-render. If I had stored the timer in useState, every keystroke would schedule an extra state update and re-render just to remember a timer ID the UI never displays. With useRef, the component re-renders exactly when it should: when the debounced search result comes back.
This isn't just a performance optimization. It's the difference between a UI that feels snappy and one that feels laggy.
TypeScript API Types
I defined API response types manually to mirror the Go models:
interface Order {
id: string;
merchant_id: string;
customer_name: string;
total_cents: number;
status: string;
created_at: string;
}
If the Go API renames a field, TypeScript immediately shows errors at every call site. This is one of those things that feels like extra work upfront and saves you hours of debugging later.
Error Boundary
One of the remaining legitimate uses for class components in React. Catches any render error in the component subtree and shows a fallback instead of a blank white screen. getDerivedStateFromError + componentDidCatch. There's no hook equivalent — React hasn't added one yet.
Phase 4: Containerization
Multi-Stage Docker Builds
Before I learned about multi-stage builds, I had a ~400MB Docker image because the Go compiler was in the final image. Multi-stage builds fix this elegantly.
Stage 1 (build): Full Go compiler image. Compiles the binary with CGO_ENABLED=0 (produces a statically linked binary, no external library dependencies) and -ldflags="-w -s" (strips debug info, shrinks the binary ~30%).
Stage 2 (run): Alpine Linux (~5MB) + the compiled binary (~10MB) = ~15MB final image.
The compiler never reaches production. Smaller image means smaller attack surface, faster pushes, faster pulls in CI.
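The backend Dockerfile has exactly that shape; the paths and versions here are approximate, not copied from the repo:

```dockerfile
# Stage 1: build — full Go toolchain, never shipped
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-w -s" -o /bin/bnpl-engine ./cmd/server

# Stage 2: run — only the static binary on a tiny base image
FROM alpine:3.20
COPY --from=build /bin/bnpl-engine /bin/bnpl-engine
EXPOSE 8080
ENTRYPOINT ["/bin/bnpl-engine"]
```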
For the frontend: Stage 1 uses Node to run npm install && npm run build, producing static HTML/JS/CSS in /dist. Stage 2 copies that into Nginx Alpine. Final image is ~40MB.
Nginx for the Frontend
Nginx serves the React build and proxies /api/* to the merchant-api container. Docker's internal DNS resolves merchant-api to the container's IP automatically.
The line I almost forgot:
try_files $uri $uri/ /index.html;
Without this, refreshing /dashboard returns a 404. Nginx looks for a file literally named dashboard on disk, finds nothing, and errors. This directive tells Nginx: if the file doesn't exist, serve index.html and let React Router handle the URL. Required for every React Router app in production.
Static asset caching: Vite includes a content hash in every JS/CSS filename — something like index-BcY7R72f.js. Nginx sets a 1-year cache with immutable. When you deploy and the hash changes, browsers automatically fetch the new file. Cache busting for free.
The Migration Service Pattern
A dedicated Docker Compose service runs SQL migrations against Postgres and then exits. App services declare `depends_on` with `condition: service_completed_successfully`.
I almost ran migrations from within the app on startup. The problem: with multiple replicas, they'd race to CREATE TABLE. First one wins, the rest crash with "table already exists." A single migration service runs once, sequentially, before any app starts.
The Full Dependency Graph
```
postgres (healthy) ──► migrate (completed) ──► bnpl-engine
                                          └──► merchant-api ◄── elasticsearch (healthy)
                                          └──► merchant-dashboard
```
Docker Compose depends_on with health conditions means this graph is enforced automatically. Services don't start until their dependencies are actually ready — not just started, actually healthy.
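In docker-compose terms, the ordering looks roughly like this; the service names come from the graph above, and everything else (build paths, images, ports) is trimmed or illustrative:

```yaml
services:
  postgres:
    image: postgres:16-alpine
    healthcheck:
      # "Healthy" means Postgres is accepting connections, not just that the container started.
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      retries: 15

  migrate:
    build: ./migrations
    depends_on:
      postgres:
        condition: service_healthy

  bnpl-engine:
    build: ./bnpl-engine
    depends_on:
      migrate:
        condition: service_completed_successfully
```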
Bonus: How I'd Convert This to a Desktop App
After building the full web stack, the natural question is: what if a merchant wants a native desktop app instead of opening a browser? Here's how I'd approach it.
My choice: Tauri. Tauri is a Rust framework that uses the OS's native WebView (WebView2 on Windows, WebKit on macOS) to render your existing React app as a desktop window.
Why Tauri over Electron:
- Electron bundles Chromium (~150MB). Tauri uses the OS's existing webview, so the final binary is 3-10MB.
- Electron gives the frontend full Node.js access. Tauri requires the frontend to declare exactly which OS APIs it needs. Smaller attack surface by design.
- Memory-safe Rust backend with near-zero runtime overhead.
The migration path from our web app is minimal:
- Keep the React frontend exactly as-is
- Add a `src-tauri/` directory with the Rust config
- Instead of Vite proxying `/api` to localhost:8081, use Tauri's sidecar pattern: ship the compiled Go binary alongside the app and have Tauri start it as a subprocess on launch
The sidecar approach makes the merchant dashboard a fully self-contained installer. Double-click, it starts its own Go backend, connects to local Postgres, opens the UI. No server required. For small merchants who don't want to manage cloud infrastructure, that's genuinely useful.
npm install @tauri-apps/cli @tauri-apps/api
npx tauri init
npx tauri dev # React dev server + Tauri window
npx tauri build # produces .exe/.dmg/.AppImage installer
What I Actually Learned
Some of this I knew conceptually before. Building it made it real.
Integer cents for money — I thought floats were "good enough" for most things and you'd just round. Then I saw 0.1 + 0.2 = 0.30000000000000004 and read about how this compounds at scale. Now I can't imagine using floats for anything financial.
SELECT FOR UPDATE — I had read about race conditions but always thought "my app won't have that much traffic." Realizing that two simultaneous requests is enough to trigger the double-payment bug, not "millions of users," made it click. The lock isn't about scale. It's about correctness.
JWT and statelessness — "No DB lookup on every request" sounds like a performance optimization. But at microservices scale, it's about not requiring a shared session store that every service depends on. The architecture consequence is what matters.
Elasticsearch's inverted index — The book analogy made this immediate for me. You don't scan every page to find a word; you look in the index. Once I understood that ES builds that index at write time, everything else — why it's fast, why writes are slightly slower, why you need to keep it synced — followed logically.
Multi-stage Docker builds — Before this project, I was shipping 400MB images with compilers in them. Multi-stage builds are one of those things where once you learn them you can't go back. The compiler has no business being in a production image.
text vs keyword in Elasticsearch — This confused me longer than it should have. text = tokenized, searchable, not sortable. keyword = exact, sortable, filterable. Get this wrong and your search doesn't work; get it right and it just works.
Debounce with useRef — I originally used useState for the timer reference and couldn't figure out why my component was re-rendering on every keypress. Switching to useRef and understanding why — changing a ref doesn't trigger re-renders — was one of those small React lessons that changes how you think about the hook model.
This was the most technically dense thing I've built so far. Four phases, two languages, three data stores, Docker orchestration. Every decision in here was made for a reason, and having to articulate those reasons — even just to myself while writing the code — made the decisions stick in a way that reading about them didn't.
If you're building something similar or have questions about any of this, I'd love to hear from you.